Skip to content

docs(secrets): scoped-store v1 spec — secrets.<scope>.yaml with per-scope keys#358

Draft
Cre-eD wants to merge 12 commits into
mainfrom
design/scoped-store-v1
Draft

docs(secrets): scoped-store v1 spec — secrets.<scope>.yaml with per-scope keys#358
Cre-eD wants to merge 12 commits into
mainfrom
design/scoped-store-v1

Conversation

@Cre-eD

@Cre-eD Cre-eD commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

What

Implementation spec for the keyless-secrets RFC's Minimal v1: per-scope secret files (secrets.pr.yaml, secrets.staging.yaml, secrets.prod.yaml, …) with per-scope recipient keys, so a CI context holds a key that opens exactly the scope files it needs — replacing the single master SIMPLE_CONTAINER_CONFIG that decrypts everything.

Spec: docs/design/keyless-secrets/scoped-store-v1.md. Highlights:

  • SOPS file-per-scope (per the RFC decision), age recipients in v1 (KMS/OIDC = v2), committed-encrypted with diffable structure.
  • Governance: CODEOWNERS-gated .sc/scopes.yaml is the single scope→recipient surface; sc secrets lint fails on SOPS-metadata drift + plaintext leaks.
  • Deterministic ${secret:} resolution (scope file → legacy store), hard-fail on missing/undecryptable — never a partial deploy.
  • Strict mode-A compatibility: legacy store untouched; scoped files are additive and never opened by old binaries (fail-closed via the shipped schemaVersion guard pattern).
  • Consumer payoff: pull_request jobs (previews, provision-preview) get SC_KEY_PR only and stop receiving SC_CONFIG — closes the biggest remaining PR-reachable secret exposure.

Process

Draft for design review (panel review planned like Phase 1). Implementation lands in this same PR after sign-off — kept as one consolidated PR per maintainer preference.

RFC: #346 (merged). ClickUp: 86caf67c3.

…aml)

Concrete spec for the RFC's Minimal v1: SOPS file-per-scope
(.sc/stacks/<stack>/secrets.<scope>.yaml), age recipients governed by a
CODEOWNERS-gated .sc/scopes.yaml, scope-aware CLI verbs, deterministic
deploy-time resolution with hard-fail-on-missing, plaintext-leak lint, strict
mode-A backward compatibility (additive files old binaries never open), CI
demotion plan (pull_request jobs get a pr-scope key instead of the master
SC_CONFIG). Implementation lands in this same PR after design sign-off.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Semgrep Scan Results

Repository: api | Commit: 3ed0f04

Check Status Details
⚠️ Semgrep Warning 1 warning(s), 5 total

Scanned at 2026-07-04 12:17 UTC

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Security Scan Results

Repository: api | Commit: 3ed0f04

Check Status Details
✅ Secret Scan Pass No secrets detected
✅ Dependencies (Trivy) Pass 0 total (no critical/high)
✅ Dependencies (Grype) Pass 0 total (no critical/high)
📦 SBOM Generated 523 components (CycloneDX)

Scanned at 2026-07-04 12:18 UTC

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

📊 Statement coverage

Measured on the documented included set (see docs/TESTING.md → Coverage scope). Observe-only — no regression gate is enforced yet.

Scope This PR main baseline Δ
Included set (Gold-tier denominator) 89.2% 90.3% -1.1 pp
Full set (whole repo, transparency) 29.0% 28.0% +1.0 pp

Baseline: main @ d64c472

Cre-eD added 11 commits July 4, 2026 00:49
Panel decision (Codex + Gemini + 2 Claude lenses):
- D1 scan-only: v1 scopes the 4 PR scan/lint jobs; deploy-shaped PR jobs and
  crossguard's Pulumi creds stay out (crossguard -> OIDC, never the pr scope).
- D2 SC_KEY_PR interim + committed KMS/OIDC v2 (reuses existing KMS+OIDC infra;
  recipient swap on the same files, stored key retired at v2).
- D3 scope files in the devops parent (integrail) store; resolver still supports
  consumer-repo scopes for the later deploy sweep.

Bakes in the 7 consensus P0s: parent/child merge constraint, real hard-fail
(not swallowed warn), scope-name + path:scope:key AAD binding, sc recipient-verify
lint, preview-deploy exclusion, secretScope PR-clamp, SC_KEY_PR blast-radius bound.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Implements the v1 scoped-secret store on sc's OWN cipher layer (RSA-OAEP +
X25519 sealed box) rather than pulling in SOPS/filippo.io-age — neither is a
current dependency and adding them would be a large supply-chain surface. Value
confidentiality, per-recipient sealing, and AEAD/OAEP associated-data binding
are all provided by the existing ciphers package. Recipients are SSH public keys
(ssh-ed25519 / ssh-rsa), consistent with the whole-file store.

- ciphers: add EncryptLargeStringWithAAD / DecryptLargeStringWithAAD /
  DecryptLargeStringWithEd25519AAD. A nil AAD reproduces the exact legacy wire
  format (RSA OAEP label / X25519 associated data), so the whole-file store is
  byte-for-byte unchanged and all existing cipher tests pass.
- scoped: scopes.yaml governance model (per-scope recipient sets, allow/disallow,
  fail-closed schemaVersion guard) + secrets.<scope>.yaml file format
  (committed-encrypted, structure readable / values opaque), per-recipient
  sealing keyed by SHA256 SSH fingerprint, set/get/delete, and offline
  VerifyConsistency for lint.
- Security bindings (P0-3): each value's AAD is domain-separated scope\x00key, so
  a ciphertext cannot be transplanted to another scope or key; LoadScopeFile
  rejects a file whose in-name scope != filename scope (rename attack).
- Tests: RSA + ed25519 round-trip (incl multi-chunk), multi-recipient, non-
  recipient rejection, scope+key transplant resistance, rename detection,
  fail-closed version guard, consistency drift, governance, name validation.

Whole-file store, CLI wiring, and deploy-time resolution are unchanged in this
commit and follow next.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
The RFC named SOPS, but the implementation reuses sc's own ciphers (no
getsops/sops or filippo.io/age dependency). Update the spec's crypto/format
layer, recipient type (SSH keys, not native age), integrity model (AEAD/OAEP
scope\x00key binding + filename/scope check, not SOPS MAC), lint gates, and
compatibility section to match what landed. Status -> IN PROGRESS.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
…ow/lint/doctor)

Adds the `sc secrets scope` command group over the scoped store, namespaced so
it never collides with the whole-file store's existing allow/disallow/add/etc.

- set (value arg or stdin), get (ambient key), list, delete
- allow/disallow: update .sc/scopes.yaml then reseal every scope file of that
  scope to the new recipient set (Reencrypt: decrypt-with-current-key then
  re-seal, all-or-nothing); disallow prints the mandatory rotate-values warning
  since removal does not rewrite git history
- lint: per-file VerifyConsistency + recipient set == scopes.yaml (drift fails)
  + scope/filename binding — the CI gate
- doctor: which scopes the ambient key can open
- set refuses to write into a file whose recipients have drifted from scopes.yaml
  (reconcile via allow/disallow first), so a value is never sealed to a stale set
- package: ScopeFile.Reencrypt, path helpers (ScopesPath/StackDir/ScopeFilePath/
  ListScopeFiles), exported Fingerprint/SameRecipients; Save now MkdirAll's parents

Verified end-to-end on a real repo: allow->set(arg+stdin)->list->get(correct
plaintext)->lint OK->doctor YES; committed file is encrypted with readable
structure and zero plaintext leak. Package tests: 20 green.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
…ini)

No P0s — both reviews confirmed the crypto core (transplant resistance,
backward-compat, per-file atomic reseal) is sound. Fixes for the CLI/lint findings:

- [Codex P1] scope get/doctor now resolve the decrypt key from --key-file, then
  SC_KEY_<SCOPE> / SC_SCOPE_KEY env, then the ambient config — so a pull_request
  scan job can hold ONLY SC_KEY_PR without a full SIMPLE_CONTAINER_CONFIG.
  Verified: get returns the value with only SC_KEY_PR set.
- [Gemini P1] reconcileRecipients (allow/disallow) is now two-phase: reseal every
  affected file in memory first, persist only after all succeed — a mid-way
  decrypt/parse failure can no longer leave files/scopes.yaml drifted.
- [Gemini P2] the private key is fetched lazily — declaring the first recipient of
  an empty scope needs no ambient key.
- [Codex P2] VerifyConsistency (lint) now base64-decodes every chunk and requires
  >= AEAD-tag length, rejecting a plaintext value smuggled under a valid recipient
  fingerprint — the offline plaintext-leak gate now has teeth.
- [Codex P2] Scopes.Allow rejects unsupported recipient key types (ECDSA, certs)
  that fingerprint but cannot be sealed to, failing fast at governance time.

Tests: 22 green (added plaintext-chunk + unsupported-key cases). Re-verified
end-to-end: allow->set->second-allow(reseal)->get-via-SC_KEY_PR->lint->ecdsa-reject.
Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Wires scoped secrets into the deploy read path (provisioner
readSecretsDescriptorFromFile): after loading a stack's whole-file values, merge
in every secrets.<scope>.yaml the AMBIENT KEY is a recipient of.

Design improvement over the RFC's secretScope: field — resolution is key-driven,
not config-driven, which DROPS P0-6 (PR-manipulable scope selection). A job
holding only SC_KEY_PR is a recipient of the pr scope alone, so it cannot decrypt
secrets.prod.yaml — the pull_request clamp is cryptographic, with no config to
subvert.

scoped.ResolveScopedValues semantics:
- no key / no scope files -> empty, no error (repos without scopes unaffected;
  the key is not even parsed). Verified: provisioner tests unchanged.
- scope files the key can't open are skipped (least privilege).
- whole-file store wins on conflict; scoped values only ADD new keys, so an
  existing ${secret:} resolution can never change.
- HARD-FAIL (P0-2) on: a value the key IS a recipient of but can't decrypt
  (tamper), a corrupt/renamed scope file, or a key present in two openable scopes
  (ambiguous). Not being a recipient is not an error.

secret-get and ${secret:} both see merged scoped values transparently. Spec
updated to the key-driven model. Tests: key-determines-scope (A sees pr not prod,
B sees prod not pr, non-recipient sees nothing) + cross-scope-dup fail; scoped +
provisioner suites green; whole module builds.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
- [P1] Resolve scoped-only stacks: readSecretsDescriptor no longer bails on a
  missing legacy secrets.yaml before scope resolution — a stack with only
  secrets.<scope>.yaml now resolves (empty descriptor + scoped merge). If neither
  a legacy file nor any openable scoped value exists, it still returns the
  ignorable not-found (behavior preserved for secret-less stacks).
- [P1] Deploy path honors CI scope keys: ResolveScopedValues now takes multiple
  candidate keys, and the provisioner gathers the ambient config key PLUS
  SC_SCOPE_KEY and any SC_KEY_<SCOPE> (e.g. SC_KEY_PR) from the env — so a
  pull_request job holding only its scope key resolves scoped secrets via
  ${secret:} and sc stack secret-get, matching the CLI's key resolution.
- [P2] Integrity hard-fails are never swallowed: scoped resolver failures (tamper,
  corrupt/renamed file, ambiguous cross-scope key) are tagged scoped.ErrScopedIntegrity
  and ReadStacks propagates them even under IgnoreSecretsMissing — so a deploy can
  no longer proceed past a broken scope file just because the affected secret
  wasn't referenced. Not being a recipient remains a silent skip (least privilege).

Tests: added multi-candidate-key resolution + ErrScopedIntegrity assertions;
scoped + provisioner suites green; whole module builds.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Verifies the three Codex-review fixes end-to-end at the provisioner:
- scoped-only stack (no legacy secrets.yaml) resolves via a CI scope key from the
  environment (SC_SCOPE_KEY, no SIMPLE_CONTAINER_CONFIG) — P1a + P1b;
- a corrupt scope file surfaces as scoped.ErrScopedIntegrity even when unreferenced
  — P2 (never swallowed);
- a stack with neither legacy nor openable scoped secrets still reports plain
  not-found (os.ErrNotExist), preserving IgnoreSecretsMissing behavior.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
…lleability) + P1s

Full multi-model panel (4 Claude lenses + Codex + Gemini). Two real P0s, both fixed
with acceptance tests; the panel's third P0 (ed25519 whole-file format changed) was a
false positive — main already used X25519 for ed25519, so the nil-AAD path is
byte-identical (now proven by TestAAD_NilRoundTrip_LegacyCompatible).

P0-A ed25519 downgrade attack (ciphers/encryption.go): DecryptLargeStringWithEd25519AAD
fell back to the legacy decryptWithEd25519 for non-X25519 blobs, which ignores AAD and
derives its key from public data — an attacker could forge a legacy blob with a chosen
plaintext, drop it in a scope file under a real recipient's fingerprint, and the deploy
would decrypt it, bypassing the scope/key binding. Now: refuse legacy blobs whenever an
AAD is supplied (the aad-less whole-file/migration path is unchanged).

P0-B RSA multi-chunk malleability (ciphers/encryption.go): every OAEP chunk used the
same label, so a hand-edited scope file could reorder/drop/splice chunks of an RSA
recipient's value and still decrypt to a permuted/truncated plaintext (lint didn't catch
it). Now: the OAEP label binds chunk index + count (chunkLabel), gated on aad != nil so
the legacy store stays byte-identical; empty-chunk-under-AAD is rejected.

P1s: Disallow no longer filters the recipient slice in place (backing-array aliasing);
scope/scopes files are written atomically (temp + rename, no torn writes that would
hard-fail deploys); disallowing the last recipient is refused with a clear error; the
provisioner's SC_KEY_* env scan is constrained to SC_KEY_<valid-scope> so an unrelated
env var is not tried as a decryption key.

Tests: TestAAD_MismatchFails (the core binding invariant, RSA + ed25519),
TestRSAChunkFraming_ReorderTruncateSpliceFails, TestEd25519Downgrade_RejectedUnderAAD,
TestAAD_NilRoundTrip_LegacyCompatible. All secrets + provisioner suites green.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
… dup, gate, docs, tests)

Closes the should-fix items from the full review panel.

- Cross-stack transplant binding (Gemini P1): scope files now carry a Stack field,
  bound into every value's AAD (stack\0scope\0key) and verified against the parent
  directory on load. A ciphertext cannot be copied between stacks, scopes, or keys.
- Stronger offline gate (Security/Coherence/Codex P1/P2): VerifyConsistency checks
  each chunk against the recipient's key type via ciphers.ValidateCiphertextShape —
  an RSA chunk must be the modulus size, an ed25519 chunk must be an X25519 sealed box
  (magic + min length). Replaces the coarse 16-byte floor; a plaintext value smuggled
  under a fingerprint is rejected offline.
- lint duplicate detection (Codex/Gemini P2):  now flags a key present in
  two scopes of a stack, or in both a scope and the legacy secrets.yaml (mode A) —
  the 'one key, one mode' guarantee, caught before deploy instead of hard-failing there.
- Docs: fixed the spec CLI block (secrets scope <verb>, dropped nonexistent edit/
  updatekeys, --key -> --key-file) + integrity section (stack + chunk-index binding,
  downgrade + shape gates); added a user-facing 'Per-scope secrets' section to
  secrets-management.md; added a v1 supersession note to the keyless-secrets RFC README.
- Tests: real CLI harness (set/get/list/lint/doctor/disallow + reconcile-fail-closed +
  undeclared-scope refusal); SameRecipients; Delete; passphrase-protected-key rejection;
  RSA scope/key transplant; cross-stack transplant (ed25519 + RSA).

All secrets + cmd_secrets + provisioner suites green; module builds; verified end-to-end
(committed file carries stack:, no plaintext, CI-path get via SC_SCOPE_KEY, lint clean).

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>
Round-2 architecture review (Codex + Gemini + Claude crypto lens) surfaced two real
weaknesses in direct-per-recipient value sealing, and 2 of 3 reviewers judged envelope
encryption materially better (not theater). Both weaknesses are structurally eliminated
by envelope, so switch to it while the format is still unshipped.

Weaknesses fixed:
- Cross-version RSA chunk splicing (Gemini): the OAEP chunk label bound index+count but
  no per-encryption id, so chunks from two commits of the same key (same aad/idx/count)
  could be mixed into a hybrid plaintext. Gone: values are no longer RSA-chunked.
- Per-recipient plaintext divergence (Claude lens): independent per-recipient seals let
  an actor with repo write + a recipient pubkey seal a DIFFERENT value into one
  recipient's slot (e.g. poison the prod deployer while local-decrypt review sees benign).
  Gone: one shared value ciphertext, so a tampered slot fails to decrypt, never diverges.

Design: each value is EncryptedValue{ciphertext, wraps} — the value is AEAD-encrypted
once under a random 32-byte DEK (ChaCha20-Poly1305, AAD = stack\0scope\0key via new
ciphers.SealAEAD/OpenAEAD/GenerateDEK), and the DEK is wrapped per recipient with the
existing RSA-OAEP / X25519 path (32 bytes = single block, so NO chunking remains). This
also deletes the whole RSA chunk-framing surface prior rounds kept patching, and makes
reseal a DEK-rewrap. No new dependency (age was the alternative but reverses the
no-new-crypto-deps stance); the whole-file store is untouched.

Tests: TestEnvelope_NoPerRecipientDivergence (tampered wrap -> decrypt failure, not a
chosen value); all prior transplant/downgrade/lint/CLI/provisioner suites updated to the
envelope shape and green. Verified end-to-end (file carries ciphertext+wraps, no
plaintext, CI-path get, lint clean). Spec updated to the envelope design.

Signed-off-by: Dmitrii Creed <creeed22@gmail.com>

@smecsia smecsia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: scoped-store v1

Strong, security-conscious work — reviewed the full crypto/store/resolve/governance/CLI/provisioner code, built it, ran the secrets test suites (all pass), CI is 22/22 green. No blocking issues; a few minor notes inline.

The security design holds up (verified, not just asserted)

  • Envelope encryption: value AEAD-sealed once under a random DEK, DEK wrapped per recipient → no per-recipient divergence, no chunk-splicing.
  • AAD binds (stack, scope, key) into both the value AEAD and each DEK wrap → transplant resistance across scope/key/stack (covered by TransplantResistance_* tests).
  • Downgrade guard refuses legacy forgeable ed25519 blobs under an AAD binding (TestEd25519Downgrade_RejectedUnderAAD).
  • RSA-OAEP label binds chunk idx+count → reorder/drop/splice fails (defense-in-depth; the DEK is single-chunk anyway).
  • Deploy resolution is key-driven — the PR clamp is cryptographic, no config surface to subvert; hard-fails on tamper/ambiguity via ErrScopedIntegrity, correctly propagated past IgnoreSecretsMissing.
  • Fail-closed schema-version guards, filename↔scope + dir↔stack binding checks, all-or-nothing reseal, atomic writes (files before scopes.yaml), and aad=nil reproduces the legacy wire format byte-for-byte.

The test suite specifically targets the adversarial paths (transplant, splice, downgrade, cross-scope ambiguity, version fail-closed, plaintext-chunk lint, envelope no-divergence) — nice.

Non-blocking note on scope/title

The PR title (docs(secrets): … spec) undersells this — it's a full ~3k-line implementation, not a spec. Suggest renaming to feat(secrets): scoped-store v1 … before merge so history/changelog reflect that.

Questions

  1. Is sc secrets scope lint enforced as a required status check? A couple of the safety properties (no legacy/scope key collision, no cross-scope duplicates) are only caught by lint — see the inline note on the deploy-time merge.
  2. v1 ships age/SSH recipients only (KMS/OIDC = v2). Is there a rotation runbook for a leaked scope key beyond disallow + the "rotate values" warning (which correctly notes git history isn't rewritten)?

}
for k, v := range scopedVals {
if _, exists := desc.Values[k]; exists {
continue // legacy whole-file store wins; `sc secrets scope lint` forbids duplicates

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent legacy-shadows-scoped at deploy. When the same key exists in both the legacy whole-file store and an openable scope, the legacy value wins here silently — no error, no warning. This is the fail-safe direction (an attacker who can only write a scope file can't override a legacy secret), so it's not a vuln. But it's a foot-gun: an operator who sets a scoped value expecting it to take effect gets the legacy value with no signal.

sc secrets scope lint flags this collision — but only helps if lint is a required CI check. Consider (a) confirming lint is required, and/or (b) emitting a deploy-time warning when a scoped key is shadowed by legacy.

if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
return errors.Wrapf(err, "failed to create directory for %s", path)
}
tmp := path + ".tmp"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed path + ".tmp" temp name means two concurrent writers to the same scope file would collide on the temp file. Non-issue for serial CLI/CI use, but os.CreateTemp(filepath.Dir(path), …) would be more robust. (Only ciphertext ever hits disk, so no plaintext-leak concern here.)

if err != nil {
return "", errors.Wrap(err, "failed to read value from stdin")
}
return strings.TrimRight(string(data), "\n"), nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings.TrimRight(data, "\n") strips all trailing newlines from stdin. Piping a PEM/cert (sc secrets scope set TLS_KEY - < key.pem) silently drops its trailing newline, which can break exact-match consumers. Consider trimming a single trailing \n (e.g. strings.TrimSuffix), or documenting the behavior on the command help.

// public key (e.g. "SHA256:abc…"). It is used as the per-recipient map key inside a
// scope file: readable, order-independent, and derivable from a private key so a
// decryptor can find its own slot.
func recipientFingerprint(authorizedKey string) (string, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: recipientFingerprint parses via ssh.ParseAuthorizedKey, while the wrap/validate paths (validateEncryptableRecipient, encryptForRecipients) parse the same key via ciphers.ParsePublicKey(secrets.TrimPubKey(...)). Two parse paths for one key — they agree for standard ssh-rsa/ssh-ed25519 keys, so this is just a "two sources of truth" robustness note, not a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants